84 research outputs found

    Patterns of base composition within and between animal mitochondrial genomes

    Nucleotide composition of a DNA molecule is a product of base substitution. Variation in nucleotide composition indicates a change in the pattern of substitution at either the level of the underlying mutational spectrum or the constraints imposed by natural selection. This work explores patterns of nucleotide usage within and between animal mitochondrial genomes and the evolutionary mechanisms that have shaped these patterns. Fourfold degenerate sites are expected to reflect the underlying mutational spectrum. Three simple measures of compositional bias, taking into account the strand-specific nature of nucleotide distribution in mtDNA, reveal considerable variation among fourfold degenerate sites of metazoan mitochondrial genomes. Log-linear analysis of intramolecular compositional patterns of mammalian mtDNA demonstrates that fourfold degenerate sites from even a single strand of the genome are not homogeneous. Rather, base composition varies among codon families and around the circular genome. A companion analysis of two additional taxonomic groups, molluscs and insects, also reveals compositional variation among codon families and between strands. The observed intramolecular variation cannot be explained solely by a simple strand-specific mutational pressure, but requires either a contextual bias to the mutational process or translational level natural selection as well. First and second codon position base composition and amino acid frequencies regressed on fourfold degenerate site composition show how mutational biases at the DNA level translate to amino acid biases in mitochondrial proteins
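The abstract above does not name its three measures of compositional bias. As a minimal sketch, assuming they resemble the commonly used A+T content, AT skew and GC skew, the following computes such values at the third positions of fourfold degenerate codons on one strand. The eight codon prefixes and all function names are illustrative assumptions, not taken from the paper.

```python
from collections import Counter

# Assumption: codon families whose first two bases fix the amino acid in both
# the standard and the vertebrate mitochondrial genetic codes.
FOURFOLD_PREFIXES = {"CT", "GT", "TC", "CC", "AC", "GC", "CG", "GG"}


def fourfold_third_positions(cds):
    """Return third-position bases of fourfold degenerate codons in a coding sequence."""
    cds = cds.upper()
    bases = []
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    for i in range(0, usable, 3):
        codon = cds[i:i + 3]
        if codon[:2] in FOURFOLD_PREFIXES and codon[2] in "ACGT":
            bases.append(codon[2])
    return bases


def skew_measures(bases):
    """Compute A+T content, AT skew and GC skew for a list of bases from one strand."""
    counts = Counter(bases)
    a, c, g, t = counts["A"], counts["C"], counts["G"], counts["T"]
    total = a + c + g + t

    def ratio(x, y):
        # (x - y) / (x + y), defined as 0.0 when neither base is present
        return (x - y) / (x + y) if (x + y) else 0.0

    return {
        "at_content": (a + t) / total if total else 0.0,
        "at_skew": ratio(a, t),
        "gc_skew": ratio(g, c),
    }


if __name__ == "__main__":
    sites = fourfold_third_positions("CTAGTAGCCATGGGT")
    print(sites, skew_measures(sites))
```

Because the skews are signed, computing them on the complementary strand flips their sign, which is one way such measures can reflect strand-specific composition.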

    Progressive Mauve: Multiple alignment of genomes with gene flux and rearrangement

    Multiple genome alignment remains a challenging problem. Effects of recombination including rearrangement, segmental duplication, gain, and loss can create a mosaic pattern of homology even among closely related organisms. We describe a method to align two or more genomes that have undergone large-scale recombination, particularly genomes that have undergone substantial amounts of gene gain and loss (gene flux). The method utilizes a novel alignment objective score, referred to as a sum-of-pairs breakpoint score. We also apply a probabilistic alignment filtering method to remove erroneous alignments of unrelated sequences, which are commonly observed in other genome alignment methods. We describe new metrics for quantifying genome alignment accuracy which measure the quality of rearrangement breakpoint predictions and indel predictions. The progressive genome alignment algorithm demonstrates markedly improved accuracy over previous approaches in situations where genomes have undergone realistic amounts of genome rearrangement, gene gain, loss, and duplication. We apply the progressive genome alignment algorithm to a set of 23 completely sequenced genomes from the genera Escherichia, Shigella, and Salmonella. The 23 enterobacteria have an estimated 2.46 Mbp of genomic content conserved among all taxa and total unique content of 15.2 Mbp. We document substantial population-level variability among these organisms driven by homologous recombination, gene gain, and gene loss. Free, open-source software implementing the described genome alignment approach is available from http://gel.ahabs.wisc.edu/mauve. Comment: Revision dated June 19, 200

    Genome-wide detection and analysis of homologous recombination among sequenced strains of Escherichia coli

    BACKGROUND: Comparisons of complete bacterial genomes reveal evidence of lateral transfer of DNA across otherwise clonally diverging lineages. Some lateral transfer events result in acquisition of novel genomic segments and are easily detected through genome comparison. Other more subtle lateral transfers involve homologous recombination events that result in substitution of alleles within conserved genomic regions. This type of event is observed infrequently among distantly related organisms. It is reported to be more common within species, but the frequency has been difficult to quantify since the sequences under comparison tend to have relatively few polymorphic sites. RESULTS: Here we report a genome-wide assessment of homologous recombination among a collection of six complete Escherichia coli and Shigella flexneri genome sequences. We construct a whole-genome multiple alignment and identify clusters of polymorphic sites that exhibit atypical patterns of nucleotide substitution using a random walk-based method. The analysis reveals one large segment (approximately 100 kb) and 186 smaller clusters of single base pair differences that suggest lateral exchange between lineages. These clusters include portions of 10% of the 3,100 genes conserved in six genomes. Statistical analysis of the functional roles of these genes reveals that several classes of genes are over-represented, including those involved in recombination, transport and motility. CONCLUSION: We demonstrate that intraspecific recombination in E. coli is much more common than previously appreciated and may show a bias for certain types of genes. The described method provides high-specificity, conservative inference of past recombination events

    The evolution of metabolic networks of E. coli

    BACKGROUND: Despite the availability of numerous complete genome sequences from E. coli strains, published genome-scale metabolic models exist only for two commensal E. coli strains. These models have proven useful for many applications, such as engineering strains for desired product formation, and we sought to explore how constructing and evaluating additional metabolic models for E. coli strains could enhance these efforts. RESULTS: We used the genomic information from 16 E. coli strains to generate an E. coli pangenome metabolic network by evaluating their collective 76,990 ORFs. Each of these ORFs was assigned to one of 17,647 ortholog groups including ORFs associated with reactions in the most recent metabolic model for E. coli K-12. For orthologous groups that contain an ORF already represented in the MG1655 model, the gene to protein to reaction associations represented in this model could then be easily propagated to other E. coli strain models. All remaining orthologous groups were evaluated to see if new metabolic reactions could be added to generate a pangenome-scale metabolic model (iEco1712_pan). The pangenome model included reactions from a metabolic model update for E. coli K-12 MG1655 (iEco1339_MG1655) and enabled development of five additional strain-specific genome-scale metabolic models. These additional models include a second K-12 strain (iEco1335_W3110) and four pathogenic strains (two enterohemorrhagic E. coli O157:H7 and two uropathogens). When compared to the E. coli K-12 models, the metabolic models for the enterohemorrhagic (iEco1344_EDL933 and iEco1345_Sakai) and uropathogenic strains (iEco1288_CFT073 and iEco1301_UTI89) contained numerous lineage-specific gene and reaction differences. All six E. coli models were evaluated by comparing model predictions to carbon source utilization measurements under aerobic and anaerobic conditions, and to batch growth profiles in minimal media with 0.2% (w/v) glucose. An ancestral genome-scale metabolic model based on conserved ortholog groups in all 16 E. coli genomes was also constructed, reflecting the conserved ancestral core of E. coli metabolism (iEco1053_core). Comparative analysis of all six strain-specific E. coli models revealed that some of the pathogenic E. coli strains possess reactions in their metabolic networks enabling higher biomass yields on glucose. Finally the lineage-specific metabolic traits were compared to the ancestral core model predictions to derive new insight into the evolution of metabolism within this species. CONCLUSION: Our findings demonstrate that a pangenome-scale metabolic model can be used to rapidly construct additional E. coli strain-specific models, and that quantitative models of different strains of E. coli can accurately predict strain-specific phenotypes. Such pangenome and strain-specific models can be further used to engineer metabolic phenotypes of interest, such as designing new industrial E. coli strains.
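The propagation step described in this abstract, carrying gene-to-reaction associations from a reference model to other strains through shared ortholog groups, can be illustrated with plain sets. This is only a rough sketch under assumed data structures. The group identifiers, reaction names and function names below are hypothetical, and the real models also involve protein complexes and manual curation that are not represented here.

```python
def core_and_pan(groups, strains):
    """Split ortholog groups into the core (present in every strain) and the pangenome.

    groups maps an ortholog-group id to the set of strains with a member ORF.
    """
    pan = set(groups)
    core = {gid for gid, members in groups.items() if strains <= members}
    return core, pan


def propagate_reactions(groups, ref_reactions, strain):
    """Collect reference-model reactions whose ortholog group has a member in `strain`.

    ref_reactions maps an ortholog-group id to the reactions it supports in the
    reference model (hypothetical names in the example below).
    """
    reactions = set()
    for gid, rxns in ref_reactions.items():
        if strain in groups.get(gid, set()):
            reactions |= set(rxns)
    return reactions


if __name__ == "__main__":
    # Toy data: three strains, three hypothetical ortholog groups.
    groups = {
        "og1": {"MG1655", "EDL933", "CFT073"},
        "og2": {"MG1655", "EDL933"},
        "og3": {"CFT073"},
    }
    strains = {"MG1655", "EDL933", "CFT073"}
    ref_reactions = {"og1": {"rxn_A"}, "og2": {"rxn_B", "rxn_C"}}
    print(core_and_pan(groups, strains))
    print(propagate_reactions(groups, ref_reactions, "CFT073"))
```

Groups absent from the reference model (here "og3") contribute no propagated reactions, which mirrors the abstract's point that the remaining groups had to be evaluated separately.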

    Gene Ontology annotation highlights shared and divergent pathogenic strategies of type III effector proteins deployed by the plant pathogen Pseudomonas syringae pv tomato DC3000 and animal pathogenic Escherichia coli strains

    Genome-informed identification and characterization of Type III effector repertoires in various bacterial strains and species is revealing important insights into the critical roles that these proteins play in the pathogenic strategies of diverse bacteria. However, non-systematic discipline-specific approaches to their annotation impede analysis of the accumulating wealth of data and inhibit easy communication of findings among researchers working on different experimental systems. The development of Gene Ontology (GO) terms to capture biological processes occurring during the interaction between organisms creates a common language that facilitates cross-genome analyses. The application of these terms to annotate type III effector genes in different bacterial species – the plant pathogen Pseudomonas syringae pv tomato DC3000 and animal pathogenic strains of Escherichia coli – illustrates how GO can effectively describe fundamental similarities and differences among different gene products deployed as part of diverse pathogenic strategies. The GO annotations for the P. syringae pv tomato DC3000 effector AvrPtoB and the E. coli effector Tir are described in depth, with special emphasis on GO's capability for capturing information about interacting proteins and taxa. GO-highlighted similarities in biological process and molecular function for effectors from additional pathosystems are also discussed

    Evolution of the metabolic and regulatory networks associated with oxygen availability in two phytopathogenic enterobacteria

    BACKGROUND: Dickeya dadantii and Pectobacterium atrosepticum are phytopathogenic enterobacteria capable of facultative anaerobic growth in a wide range of O2 concentrations found in plant and natural environments. The transcriptional response to O2 remains under-explored for these and other phytopathogenic enterobacteria although it has been well characterized for animal-associated genera including Escherichia coli and Salmonella enterica. Knowledge of the extent of conservation of the transcriptional response across orthologous genes in more distantly related species is useful to identify rates and patterns of regulon evolution. Evolutionary events such as loss and acquisition of genes by lateral transfer events along each evolutionary branch result in lineage-specific genes, some of which may have been subsequently incorporated into the O2-responsive stimulon. Here we present a comparison of transcriptional profiles measured using densely tiled oligonucleotide arrays for two phytopathogens, Dickeya dadantii 3937 and Pectobacterium atrosepticum SCRI1043, grown to mid-log phase in MOPS minimal medium (0.1% glucose) with and without O2. RESULTS: More than 7% of the genes of each phytopathogen are differentially expressed with greater than 3-fold changes under anaerobic conditions. In addition to anaerobic metabolism genes, the O2-responsive stimulon includes a variety of virulence and pathogenicity genes. Few of these genes overlap with orthologous genes in the anaerobic stimulon of E. coli. We define these as the conserved core, in which the transcriptional pattern as well as genetic architecture are well preserved. This conserved core includes previously described anaerobic metabolic pathways such as fermentation. Other components of the anaerobic stimulon show variation in genetic content, genome architecture and regulation. Notably, formate metabolism, nitrate/nitrite metabolism, and fermentative butanediol production differ between E. coli and the phytopathogens. Surprisingly, the overlap of the anaerobic stimulon between the phytopathogens is also relatively small considering that they are closely related, occupy similar niches and employ similar strategies to cause disease. There are cases of interesting divergences in the pattern of transcription of genes between Dickeya and Pectobacterium for virulence-associated subsystems including the type VI secretion system (T6SS), suggesting that fine-tuning of the stimulon impacts interaction with plants or competing microbes. CONCLUSIONS: The small number of genes (an even smaller number if we consider operons) comprising the conserved core transcriptional response to O2 limitation demonstrates the extent of regulatory divergence prevalent in the Enterobacteriaceae. Our orthology-driven comparative transcriptomics approach indicates that the adaptive response in the enterobacteria is a result of interaction of core (regulators) and lineage-specific (structural and regulatory) genes. Our subsystems-based approach reveals that similar phenotypic outcomes are sometimes achieved by each organism using different genes and regulatory strategies.

    Reordering contigs of draft genomes using the Mauve Aligner

    Summary: Mauve Contig Mover provides a new method for proposing the relative order of contigs that make up a draft genome based on comparison to a complete or draft reference genome. A novel application of the Mauve aligner and viewer provides an automated reordering algorithm coupled with a powerful drill-down display allowing detailed exploration of results

    The enterobacterium Trabulsiella odontotermitis presents novel adaptations related to its association with fungus-growing termites

    Fungus-growing termites rely on symbiotic microorganisms to help break down plant material and to obtain nutrients. Their fungal cultivar, Termitomyces, is the main plant degrader and food source for the termites, while gut bacteria complement Termitomyces in the degradation of foodstuffs, fixation of nitrogen, and metabolism of amino acids and sugars. Due to the community complexity and because these typically anaerobic bacteria can rarely be cultured, little is known about the physiological capabilities of individual bacterial members of the gut communities and their associations with the termite host. The bacterium Trabulsiella odontotermitis is associated with fungus-growing termites, but this genus is generally understudied, with only two described species. Taking diverse approaches, we obtained a solid phylogenetic placement of T. odontotermitis among the Enterobacteriaceae, investigated the physiology and enzymatic profiles of T. odontotermitis isolates, determined the localization of the bacterium in the termite gut, compared draft genomes of two T. odontotermitis isolates to those of their close relatives, and examined the expression of genes relevant to host colonization and putative symbiont functions. Our findings support the hypothesis that T. odontotermitis is a facultative symbiont mainly located in the paunch compartment of the gut, with possible roles in carbohydrate metabolism and aflatoxin degradation, while displaying adaptations to association with the termite host, such as expressing genes for a type VI secretion system which has been demonstrated to assist bacterial competition, colonization, and survival within hosts

    Using Comparative Genomics for Inquiry-Based Learning to Dissect Virulence of Escherichia coli O157:H7 and Yersinia pestis

    Genomics and bioinformatics are topics of increasing interest in undergraduate biological science curricula. Many existing exercises focus on gene annotation and analysis of a single genome. In this paper, we present two educational modules designed to enable students to learn and apply fundamental concepts in comparative genomics using examples related to bacterial pathogenesis. Students first examine alignments of genomes of Escherichia coli O157:H7 strains isolated from three food-poisoning outbreaks using the multiple-genome alignment tool Mauve. Students investigate conservation of virulence factors using the Mauve viewer and by browsing annotations available at the A Systematic Annotation Package for Community Analysis of Genomes database. In the second module, students use an alignment of five Yersinia pestis genomes to analyze single-nucleotide polymorphisms of three genes to classify strains into biovar groups. Students are then given sequences of bacterial DNA amplified from the teeth of corpses from the first and second pandemics of the bubonic plague and asked to classify these new samples. Learning-assessment results reveal student improvement in self-efficacy and content knowledge, as well as students’ ability to use BLAST to identify genomic islands and conduct analyses of virulence factors from E. coli O157:H7 or Y. pestis. Each of these educational modules offers educators new ready-to-implement resources for integrating comparative genomic topics into their curricula

    CGHScan: finding variable regions using high-density microarray comparative genomic hybridization data

    BACKGROUND: Comparative genomic hybridization can rapidly identify chromosomal regions that vary between organisms and tissues. This technique has been applied to detecting differences between normal and cancerous tissues in eukaryotes as well as genomic variability in microbial strains and species. The density of oligonucleotide probes available on current microarray platforms is particularly well-suited for comparisons of organisms with smaller genomes like bacteria and yeast where an entire genome can be assayed on a single microarray with high resolution. Available methods for analyzing these experiments typically confine analyses to data from pre-defined annotated genome features, such as entire genes. Many of these methods are ill suited for datasets with the number of measurements typical of high-density microarrays. RESULTS: We present an algorithm for analyzing microarray hybridization data to aid identification of regions that vary between an unsequenced genome and a sequenced reference genome. The program, CGHScan, uses an iterative random walk approach integrating multi-layered significance testing to detect these regions from comparative genomic hybridization data. The algorithm tolerates a high level of noise in measurements of individual probe intensities and is relatively insensitive to the choice of method for normalizing probe intensity values and identifying probes that differ between samples. When applied to comparative genomic hybridization data from a published experiment, CGHScan identified eight of nine known deletions in a Brucella ovis strain as compared to Brucella melitensis. The same result was obtained using two different normalization methods and two different scores to classify data for individual probes as representing conserved or variable genomic regions. The undetected region is a small (58 base pair) deletion that is below the resolution of CGHScan given the array design employed in the study. 
CONCLUSION: CGHScan is an effective tool for analyzing comparative genomic hybridization data from high-density microarrays. The algorithm is capable of accurately identifying known variable regions and is tolerant of high noise and varying methods of data preprocessing. Statistical analysis is used to define each variable region providing a robust and reliable method for rapid identification of genomic differences independent of annotated gene boundaries
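The abstract describes an iterative random walk over probe data but gives no scoring details. As a loose illustration only, and not CGHScan's actual procedure, the sketch below treats each probe as a step up (variable) or down (conserved) and reports the highest-scoring contiguous run of probes. The step sizes and function names are assumptions, and no significance testing is attempted.

```python
def probe_scores(variable_calls, reward=1.0, penalty=-1.5):
    """Turn per-probe variable/conserved calls into random-walk steps.

    The reward and penalty values are arbitrary assumptions for illustration.
    """
    return [reward if is_variable else penalty for is_variable in variable_calls]


def best_segment(scores):
    """Return (start, end, total) of the highest-scoring contiguous run.

    Indices are half-open probe positions; (0, 0, 0.0) means no positive run.
    """
    best_total, best_start, best_end = 0.0, 0, 0
    run_total, run_start = 0.0, 0
    for i, step in enumerate(scores):
        if run_total <= 0:
            # The walk has returned to or below its baseline: restart here.
            run_total, run_start = 0.0, i
        run_total += step
        if run_total > best_total:
            best_total, best_start, best_end = run_total, run_start, i + 1
    return best_start, best_end, best_total


if __name__ == "__main__":
    # Probes ordered by reference-genome coordinate; True marks a probe called variable.
    calls = [False, False, True, True, False, True, True, False, False, False]
    print(best_segment(probe_scores(calls)))
```

A single conserved call inside the run (position 4 in the toy data) does not split the region, which is the sense in which a walk of this kind tolerates noisy individual probes.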